Defining a Qlik Replicate task
To work with Compose, you first need to define a Qlik Replicate task that replicates the source tables from the source endpoint to a landing zone in storage (defined as the target endpoint in the Replicate task). The landing zone should then be defined as the data source for the Compose project.
For information on which endpoints can be used in a Replicate task that lands data for Compose, see Supported hive distributions for Data Lake projects.
The steps below highlight the settings that are required when using Qlik Replicate with Compose. For a full description of setting up tasks in Qlik Replicate, please refer to the Qlik Replicate Help.
Prerequisites
When defining the Replicate task, make sure the following prerequisites have been met.
- If the Landing Zone database supports append, it is recommended to select Sequence as the file format in the Replicate target endpoint settings and to set the Control Tables format (if available) to Text. This improves performance by allowing Replicate to append to the file instead of creating a new file for every Change Data Partition.
If the above is not possible, it is recommended to periodically delete files that are no longer required from the target directory. This prevents files from accumulating and degrading performance. The deletion can be done automatically using Replicate's partition retention feature. For more information, see the Qlik Replicate Help.
- When Microsoft Azure HDInsight is defined as the Replicate target endpoint, you must set the endpoint's Target storage format to Sequence.
- When Oracle is defined as the source endpoint in the Replicate task, full supplemental logging should be defined for all source table columns that exist on the target and any source columns referenced in expressions.
When using live views, it is recommended to turn off Speed partition mode in the Replicate task settings to ensure transactional consistency. When Speed partition mode is off, Replicate closes the partition only at the end of each transaction. You might therefore need to shorten the partition interval so that changes are propagated to Compose in a timely manner. Shortening the partition interval might in turn require you to increase the partition cleanup frequency to prevent too many files from accumulating on the target and degrading performance.
For information about turning off Speed partition mode, setting partitioning intervals, and partition cleanup, see the Replicate Help.
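As an illustration of the periodic cleanup mentioned above, the following is a minimal sketch that selects files older than a retention window. It assumes that age alone decides whether a file is "no longer required", which is a simplification; the file names, the `(name, last_modified)` listing format, and the function name are all hypothetical. It does not call any Replicate API, and Replicate's partition retention feature remains the documented way to automate this.

```python
from datetime import datetime, timedelta


def select_expired_files(files, now, retention):
    """Return the names of files last modified before the retention cutoff.

    files: iterable of (name, last_modified) pairs. This listing format is a
    hypothetical stand-in for whatever the target storage actually returns.
    """
    cutoff = now - retention
    # Assumption: a file is "no longer required" once it is older than the cutoff.
    return sorted(name for name, last_modified in files if last_modified < cutoff)


if __name__ == "__main__":
    listing = [
        ("partition_a.seq", datetime(2024, 1, 2)),
        ("partition_b.seq", datetime(2024, 1, 9)),
    ]
    expired = select_expired_files(listing, datetime(2024, 1, 10), timedelta(days=5))
    print(expired)
```

The function only selects candidates; actually deleting them, and confirming that Compose has already processed them, is left to the caller.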
Limitations and Considerations
- Replicate allows you to define global transformations that are applied to source/Change tables during task runtime. The following global transformations, however, should not be defined (as they are not compatible with Compose tasks):
- Rename Change Table
- Rename Change Table schema
- The Create target control tables in schema option in the Control Tables tab of the Replicate task settings is not supported.
- As Compose requires a full-after image to be able to perform Change Processing, the following Replicate source endpoints are not directly supported (as they do not provide a full-after image):
- SAP HANA (log based)
- Salesforce
- Compose does not support the JSON and XML data types. Therefore, columns that are usually created with these data types (by the Replicate target endpoint) should be created as STRINGs instead. This can be done automatically within Replicate using a data type transformation. For information on which target endpoints support JSON and XML data types as well as instructions on how to create a data type transformation, refer to the Replicate Help.
- From Compose May 2022 SR 01, if you use Replicate November 2022 to land data in Databricks, only the Databricks (Cloud Storage) target endpoint can be used. If you are using an earlier supported version of Replicate, you can continue using the existing Databricks target endpoints.
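The limitations above can be read as a checklist. The sketch below expresses some of them as a check over a simplified task description. The dictionary keys (`source_endpoint`, `global_transformations`, `create_target_control_tables_in_schema`, `target_column_types`) are hypothetical and do not correspond to any Replicate export format or API; only the endpoint names, transformation names, and data types come from the list above.

```python
# Values taken from the limitations listed above.
UNSUPPORTED_SOURCES = {"SAP HANA (log based)", "Salesforce"}
FORBIDDEN_GLOBAL_TRANSFORMATIONS = {"Rename Change Table", "Rename Change Table schema"}
UNSUPPORTED_DATA_TYPES = {"JSON", "XML"}


def find_limitation_issues(task):
    """Return human-readable issues for a hypothetical task description dict."""
    issues = []

    source = task.get("source_endpoint")
    if source in UNSUPPORTED_SOURCES:
        issues.append(f"source endpoint '{source}' is not directly supported")

    for name in task.get("global_transformations", []):
        if name in FORBIDDEN_GLOBAL_TRANSFORMATIONS:
            issues.append(f"global transformation '{name}' should not be defined")

    if task.get("create_target_control_tables_in_schema"):
        issues.append("'Create target control tables in schema' is not supported")

    for column, data_type in task.get("target_column_types", {}).items():
        if data_type.upper() in UNSUPPORTED_DATA_TYPES:
            issues.append(f"column '{column}' should be created as STRING, not {data_type}")

    return issues
```

An empty list means none of the listed limitations were detected in the description; it says nothing about settings the dictionary does not model, such as the Databricks endpoint restriction.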
Setting up the task
To define the task:
- Open Qlik Replicate and in the New Task dialog, do one of the following:
- Open the Manage Endpoint Connections window and define a source and target endpoint. The target endpoint must be the Hive database where you want Compose to create the Storage Zone tables. For more information on supported endpoints, see Supported hive distributions for Data Lake projects.
- Add the endpoints to the Replicate task and then select which source tables to replicate.
- This step is not relevant for Full Load only tasks. To facilitate Schema evolution in Compose, select the DDL History Control Table in the Task Settings’ Metadata|Control Tables tab. If you intend to scan all data sources (when performing schema evolution), then you must do this for ALL Replicate tasks that move data to the Landing Zone, even those with source endpoints that do not support schema evolution (e.g. Salesforce).
Note: If you want the DDL History Control Table to be updated with any new source tables that are added during the Replicate task, you must define Table Selection Patterns in Replicate's Select Tables window.
- This step is not relevant for Full Load only tasks. In the Task Settings’ Change Processing|Store Changes Settings tab, make sure that Store Changes in is set to Change tables.
- This step is not relevant for Full Load only tasks. In the Task Settings’ Change Processing|Store Changes Settings tab, enable Change Data Partitioning.
- This step is not relevant for Full Load only tasks. In the Task Settings’ Metadata|Control Tables tab, select the Change Data Partitioning Control Table.
- This step is not relevant for Full Load only tasks. If a Primary Key in a source table can be updated, it is recommended to turn on the DELETE and INSERT when updating a primary key column option in Replicate's task settings' Change Processing Tuning tab. When this option is turned on, history of the old record will not be preserved in the new record. Note that this option is supported from Replicate November 2022 only.
- Run the task.
Wait for the Full Load replication to complete and then continue the workflow in Compose as described in Adding and managing data warehouse projects.
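The Change Processing settings called for in the steps above can be summarized as a small pre-run checklist. The following sketch assumes the task settings have been copied by hand into a dictionary; the key names are hypothetical and are not Replicate setting identifiers. The DDL History Control Table is included on the assumption that schema evolution will be used in Compose.

```python
# Expected values, keyed by hypothetical names for the settings described in the steps.
EXPECTED_SETTINGS = {
    "store_changes_in": "Change tables",
    "change_data_partitioning": True,
    "ddl_history_control_table": True,  # assumes schema evolution is wanted
    "change_data_partitioning_control_table": True,
}


def check_compose_task_settings(settings):
    """Return the names of settings that differ from what the steps call for."""
    if settings.get("full_load_only"):
        # The Change Processing steps are not relevant for Full Load only tasks.
        return []
    return [
        name
        for name, expected in EXPECTED_SETTINGS.items()
        if settings.get(name) != expected
    ]
```

For example, passing a dictionary in which `change_data_partitioning` is `False` and the other three settings match returns a list containing only that name.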